A Statistical Framework for the Prediction of Fault-Proneness

Authors

  • Yan Ma
  • Lan Guo
  • Bojan Cukic

Abstract

Accurate prediction of fault-prone modules in the software development process enables effective discovery and identification of defects. Such prediction models are especially valuable for large-scale systems, where verification experts need to focus their attention and resources on problem areas in the system under development. This paper presents a methodology for predicting fault-prone modules using a modified random forests algorithm. Random forests improve classification accuracy by growing an ensemble of classification trees and letting them vote on the classification decision. We applied the methodology to five NASA public domain defect data sets. These data sets vary in size, but all typically contain a small number of defect samples in the learning set. For instance, in project PC1, only around 7% of the instances are defects, so a classifier that labels every module as non-defective would already reach about 93% accuracy while detecting no defects at all. If overall accuracy maximization is the goal, learning from such data therefore usually results in a biased classifier, i.e., one that assigns the majority of samples to the non-defect class. To obtain better prediction of fault-proneness, two strategies are investigated: a proper sampling technique for constructing the tree classifiers, and threshold adjustment for determining the winning class. Both are found to be effective in accurately predicting fault-prone modules. In addition, the paper presents a thorough and statistically sound comparison of these methods against ten other classifiers frequently used in the literature.
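The two strategies can be illustrated with a small, self-contained sketch. It is not the authors' implementation and makes several assumptions: decision stumps (one-split trees) stand in for full classification trees to keep it short, the sampling step is assumed to be a class-balanced bootstrap (equal draws from the defect and non-defect classes for every tree), and all function names and parameters (`balanced_bootstrap`, `fit_forest`, `n_features`, `threshold`) are hypothetical.

```python
import random


def balanced_bootstrap(X, y, rng):
    """Class-balanced bootstrap: draw n_min rows with replacement from each class,
    where n_min is the size of the smaller (usually the defect) class."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    if not pos or not neg:
        raise ValueError("training data must contain both classes")
    n_min = min(len(pos), len(neg))
    idx = [rng.choice(pos) for _ in range(n_min)] + [rng.choice(neg) for _ in range(n_min)]
    return [X[i] for i in idx], [y[i] for i in idx]


def fit_stump(X, y, features):
    """Fit a one-split tree: pick the (feature, cut, direction) with the fewest
    training errors among the given feature indices."""
    best = None
    for f in features:
        for cut in sorted({row[f] for row in X}):
            for sign in (1, -1):  # sign=1: defect if value >= cut; sign=-1: if value <= cut
                errors = sum((sign * (row[f] - cut) >= 0) != (label == 1)
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, f, cut, sign)
    _, f, cut, sign = best
    return lambda row: 1 if sign * (row[f] - cut) >= 0 else 0


def fit_forest(X, y, n_trees=50, n_features=2, seed=0):
    """Grow an ensemble; each member sees its own balanced bootstrap sample and a
    random subset of metrics (for a one-split tree this equals per-node sampling)."""
    rng = random.Random(seed)
    n_metrics = len(X[0])
    forest = []
    for _ in range(n_trees):
        Xb, yb = balanced_bootstrap(X, y, rng)
        features = rng.sample(range(n_metrics), min(n_features, n_metrics))
        forest.append(fit_stump(Xb, yb, features))
    return forest


def defect_vote_fraction(forest, row):
    """Fraction of ensemble members voting 'defect' for one module."""
    return sum(tree(row) for tree in forest) / len(forest)


def predict(forest, row, threshold=0.5):
    """Threshold adjustment: flag a module as fault-prone when at least `threshold`
    of the votes are 'defect' (0.5 is plain majority voting, ties count as 'defect')."""
    return 1 if defect_vote_fraction(forest, row) >= threshold else 0


if __name__ == "__main__":
    # Synthetic, hypothetical data: three metrics per module, roughly 7% defects.
    rng = random.Random(1)
    X, y = [], []
    for _ in range(300):
        defect = rng.random() < 0.07
        shift = 5.0 if defect else 0.0
        X.append([rng.gauss(shift, 2.0), rng.gauss(shift, 2.0), rng.gauss(0.0, 1.0)])
        y.append(1 if defect else 0)
    forest = fit_forest(X, y, n_trees=25, seed=0)
    for t in (0.5, 0.3):
        flagged = sum(predict(forest, r, t) for r in X)
        caught = sum(predict(forest, r, t) for r, lab in zip(X, y) if lab == 1)
        print(f"threshold={t}: flagged {flagged} modules, caught {caught}/{sum(y)} defects")
```

Lowering `threshold` below 0.5 labels more modules as fault-prone, trading extra false alarms for higher recall on the rare defect class; in practice the cutoff would need to be chosen and validated on held-out data rather than fixed in advance.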

Publication date: 2005